1. INTRODUCTION

Drinking water treatment systems in aqueducts involve a chemical dosing process to control the key
parameters that guarantee the quality of treated water. The variables of these processes are correlated
with one another, which makes classical control strategies very difficult to tune. Proportional, integral,
and derivative (PID) controllers and their variants, which belong to classical control, offer many
advantages at an industrial level, as explained by Yu et al. [1] and Abdullah and Ali [2]. However, they
are best suited to linear systems whose models are feasible to establish. For systems that do not meet
this condition, there are control strategies based on artificial intelligence [3], [4], called intelligent
control techniques [5].

Intelligent control can also replicate PID strategies, as in [6] and [7]; however, it is more complex
to implement. It is therefore used with non-linear plants, such as the prediction systems presented by
Mohamed et al. [8] or satellite dynamic altitude control systems [9]. Intelligent control applications
cover vast and diverse fields, from academic settings to industrial processes such as those in [10]-[16].
Neural controllers have even been developed for security management in residential homes [17]. As a
result, systems as complex and delicate as water treatment systems have also adopted intelligent control
techniques [18]-[20]. Recent reviews of the state of the art [21], [22] highlight neural networks as an
essential technique in water treatment, where one of the critical process parameters is pH control, as
stated in [23]-[25].
At the Ariari Regional Aqueduct (ARA) in Meta, Colombia, the goal is to adjust the chemical dosing
process in the treatment plant. pH is a critical factor in meeting national regulations, and it is
affected by the high correlation between the intervening variables. Intelligent control based on neural
networks can help automate this process, which, because of its complexity, has been carried out manually
with the technique known as jar tests [26]. Each jar test lasts 20-30 min, which makes timely adjustment
difficult.

This article is organized in four sections. Section 1 is the present introduction. Section 2 describes
the methodology, including the dataset and the network architecture. Section 3 presents the analysis of
results. Finally, section 4 presents the conclusions.


2. METHOD 

In surface water catchments, the physicochemical parameters of raw water can change suddenly, and so
can the dosage required for drinking water treatment. This affects water quality if proper adjustments
are not made to the process chemistry. Figure 1 shows the main stages in the development of the
intelligent control model, which are discussed in the following subsections.


[Flow chart stages, in order: data collection, data pre-processing, model selection, model evaluation,
model validation]

Figure 1. Intelligent control flow chart


2.1. Data collection and pre-processing 

In the drinking water treatment process, it is essential to adjust the pH stabilizer and the coagulant
as soon as variations occur in order to guarantee water quality. For data collection, the analysis of
chemical dosing identified the following essential input variables: color, turbidity, pH, amount of lime
dosed, and amount of aluminum sulfate type A dosed. The output variables are color, turbidity, and pH.
Empresa de Servicios Públicos del Meta (EDESA S.A. E.S.P.), at ARA, provided a set of 720 jar analysis
reports that comprised 488 data vectors.

After the data collection stage, data processing starts by assigning formats in the Excel database.
The number format was selected to facilitate loading the data into the modeling process later. The
preprocessing phase first identifies the amount of lost or missing data. Incomplete or empty records are
removed from the general dataset, since these "empty" data affect the behavior of machine learning
models.

The data collected for turbidity, color, pH, lime, and aluminum sulfate type A span wide ranges that
differ from one variable to another, which is inherent to the process. During training, however, this
dispersion increases the convergence time of the algorithm and sometimes gives poor results. Scaling
therefore converts the data to a uniform range, normalizing the original data to values between 0 and 1.
This is done with the maximum and minimum of each variable (1), which preserves the proportions between
the data. As an example, Figure 2 shows the preprocessing of the variable turbidity (input): the initial
cleaned data on the left and the normalized data on the right.


$$X_{\text{normalized}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}} \tag{1}$$
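The two preprocessing steps described above, removal of incomplete records and min-max scaling with equation (1), can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the row layout, and the sample values are assumptions made up for the example, and the handling of a constant column is a choice the text does not discuss.

```python
import math


def drop_incomplete(rows):
    """Keep only rows in which every value is a number other than NaN."""
    def usable(value):
        return isinstance(value, (int, float)) and not math.isnan(value)
    return [row for row in rows if all(usable(v) for v in row)]


def min_max_normalize(values):
    """Scale one variable to [0, 1] with equation (1); also return its limits."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        # A constant column makes equation (1) undefined; map it to 0 instead.
        return [0.0] * len(values), x_min, x_max
    return [(x - x_min) / span for x in values], x_min, x_max


def denormalize(scaled, x_min, x_max):
    """Invert equation (1) to return to the original units."""
    return [s * (x_max - x_min) + x_min for s in scaled]


# Made-up rows of (turbidity in NTU, pH); None and NaN mark missing readings.
rows = [(2.0, 6.0), (6.0, None), (10.0, 6.5),
        (float("nan"), 7.0), (4.0, 5.5), (6.0, 6.2)]
clean = drop_incomplete(rows)
turbidity = [row[0] for row in clean]
scaled, t_min, t_max = min_max_normalize(turbidity)
print(scaled)  # [0.0, 1.0, 0.25, 0.5]
```

Keeping the minimum and maximum of each variable makes it possible to map network outputs back to physical units with `denormalize`.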
[Plots: unnormalized input turbidity (left) and normalized input turbidity (right), plotted against
samples]

Figure 2. Unnormalized and normalized input turbidity data


For training the neural networks, the total data sample (488) was distributed as shown in Table 1.
The database is divided into three sets for training, testing, and validation. Each sample corresponds
to the tabulated information of one manual jar test.


Table 1. Distribution of samples in the training, testing and validation stages 


Stage        Data percentage   Number of samples
Training     70%               342
Testing      15%               73
Validation   15%               73
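The sample counts in Table 1 (342, 73, and 73 out of 488) correspond to roughly 70%, 15%, and 15% of the data. A split of this kind can be sketched as below. The shuffling, the seed, and the function name are assumptions; the text does not say how samples were assigned to each set.

```python
import random


def split_indices(n_samples, train_frac=0.70, test_frac=0.15, seed=0):
    """Shuffle sample indices and cut them into training, testing, validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_frac)
    n_test = round(n_samples * test_frac)
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    validation = indices[n_train + n_test:]  # the remainder goes to validation
    return train, test, validation


train, test, validation = split_indices(488)
print(len(train), len(test), len(validation))  # 342 73 73
```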


2.2. Neural architecture

For chemical dosing, the values of both the input variables and the output variables of interest are
known, so the problem suits a supervised machine learning model such as an artificial neural network. It
is essential to clarify that chemical dosing is carried out in two stages. First, pH, turbidity, and
color are measured to determine the initial levels of lime and aluminum sulfate. The second-stage network
takes turbidity, color, pH, lime, and aluminum sulfate as input data and turbidity, color, and pH as
output variables. The neural architecture of the model follows the same two-stage form: two neural
networks, one for each stage, as shown in Figure 3.
Figure 3. Neural network model


For the training phase, the input and output relationship of each model is straightforward, so the
number of inputs and outputs of each network can be set directly; the number of neurons in the hidden
layer is determined through an iterative process. Figure 4 shows a general representation of the proposed
models, with the internal architectures of the neural networks of model 1 (left) and model 2 (right).
Model 1 had 20 (twenty) neurons in the hidden layer, twice as many as model 2, because of the amount and
complexity of the data used for training. The outputs of model 1 become inputs for model 2, in line with
the manual dosing process.


[Network diagrams for model 1 (left) and model 2 (right): input, hidden layer, output layer, output]

Figure 4. Neural network architecture of models 1 and 2
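The chaining of the two models can be sketched as a pair of single-hidden-layer networks in which the doses predicted by model 1 are appended to the measured water parameters and passed to model 2. The layer sizes follow the text (20 hidden neurons for model 1, half as many for model 2). Everything else is an assumption for illustration: the tanh hidden activation, the linear output layer, the random untrained weights, and the input values.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Create random weights and zero biases for one dense layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(inputs, layer, activation=None):
    """Weighted sum plus bias for each neuron, then an optional activation."""
    weights, biases = layer
    sums = [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]
    return [activation(s) for s in sums] if activation else sums


def make_network(n_in, n_hidden, n_out, rng):
    """Build a network with one hidden layer and one output layer."""
    return make_layer(n_in, n_hidden, rng), make_layer(n_hidden, n_out, rng)


def predict(network, inputs):
    """Forward pass: tanh hidden layer followed by a linear output layer."""
    hidden_layer, output_layer = network
    return dense(dense(inputs, hidden_layer, math.tanh), output_layer)


rng = random.Random(0)
model_1 = make_network(3, 20, 2, rng)  # (turbidity, color, pH) -> (lime, sulfate)
model_2 = make_network(5, 10, 3, rng)  # water + doses -> (turbidity, color, pH)

raw_water = [0.7, 0.5, 0.4]            # normalized turbidity, color, pH (made up)
doses = predict(model_1, raw_water)
treated = predict(model_2, raw_water + doses)
print(len(doses), len(treated))  # 2 3
```

Because the weights are untrained, only the shapes and the data flow are meaningful here, not the numerical outputs.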
The number of neurons in the hidden layer of each model was selected by iterating in steps of five
until the best possible approximation was obtained. Although more neurons in the hidden layer increase
the time required for training, validation, and testing, the computational load was not high. Each
training trial used a fixed number of epochs, set at 1000 for both neural network models.
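The search over hidden-layer sizes can be sketched as a simple loop that keeps the size with the lowest error. The upper bound of 50 neurons and the `train_and_score` callback are hypothetical; the scoring function below is a toy stand-in that only exercises the loop and does not reproduce any result from this work.

```python
def select_hidden_size(train_and_score, candidates=range(5, 55, 5)):
    """Return the candidate size with the lowest error, and that error."""
    best_size, best_error = None, float("inf")
    for size in candidates:
        error = train_and_score(size)
        if error < best_error:
            best_size, best_error = size, error
    return best_size, best_error


# Toy stand-in: a real scorer would train a network with `size` hidden
# neurons for a fixed number of epochs and return its validation error.
def toy_score(size):
    return (size - 15) ** 2 / 1000 + 0.01


print(select_hidden_size(toy_score))  # (15, 0.01)
```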

Figures 5 and 6 show the training, validation, and testing of each neural network model. The value of
R indicates the relationship between the output data and the target values; R = 1 indicates a perfect
fit. For the case under study, the value of R in model 1 was 0.81325, which is a high value. In model 2,
the value of R was 0.96158, which shows an excellent relationship between the outputs and the targets.
Figure 5. Neural network training for model 1, view of the regression graphs
Figure 6. Neural network training for model 2, view of the regression graphs
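Assuming that R in the regression graphs is the Pearson correlation coefficient between network outputs and targets, which the text does not state explicitly, it can be computed as in this short sketch. The sample values are made up.

```python
import math


def pearson_r(outputs, targets):
    """Pearson correlation coefficient between predictions and targets."""
    n = len(outputs)
    mean_o = sum(outputs) / n
    mean_t = sum(targets) / n
    cov = sum((o - mean_o) * (t - mean_t) for o, t in zip(outputs, targets))
    var_o = sum((o - mean_o) ** 2 for o in outputs)
    var_t = sum((t - mean_t) ** 2 for t in targets)
    return cov / math.sqrt(var_o * var_t)


targets = [1.0, 2.0, 3.0, 4.0]
r_perfect = pearson_r([2.0, 4.0, 6.0, 8.0], targets)
print(r_perfect)  # 1.0 for outputs that are an exact linear function of targets
```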
For the evaluation phase of models 1 and 2, a dataset equivalent to 15% of the total available data
was reserved. In Figure 7, the error is plotted against each training cycle of model 1. The best
performance was obtained at epoch 394, with an error of 0.022209. The smaller the mean square error, the
closer the predicted values are to the observed values.
Figure 7. Neural network training performance results for model 1


Figure 8 shows the error against each training cycle of model 2 and indicates that the best
performance was obtained at epoch 197, with an error of 0.0017541. The smaller the mean square error
(MSE), the closer the predicted values are to the observed values. The low error obtained validates this
architecture for process automation tests.
Figure 8. Neural network training performance results for model 2
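The performance curves report the MSE per epoch and the epoch at which it is lowest. A minimal sketch of both calculations follows. The error curve is made up, and counting epochs from 1 is an assumption.

```python
def mean_squared_error(predicted, observed):
    """Average of the squared differences between two equal-length lists."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def best_epoch(errors_per_epoch):
    """Return (epoch, error) for the lowest error; epochs are counted from 1."""
    index = min(range(len(errors_per_epoch)), key=errors_per_epoch.__getitem__)
    return index + 1, errors_per_epoch[index]


print(mean_squared_error([0.5, 0.25], [0.0, 0.75]))  # 0.25
print(best_epoch([0.30, 0.12, 0.05, 0.07, 0.09]))    # (3, 0.05)
```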


3. RESULTS AND DISCUSSION 

Figure 9 shows the results that the algorithm delivers for a jar test in the machine learning model
developed to calculate the dosage of lime and sulfate to treat drinking water at ARA. In other words, it
is possible to determine the discharge of lime and sulfate required to adjust the pH level of the treated
water. The neural model seeks to replicate the dosage levels that are appropriate under the standard,
according to the parameterization of the network.
Lime = 19.4984726 ppm

Sulfate = 47.9535313 ppm 

Color = 10.8484245 PCU 
Turbidity = 1.4653494 NTU 

pH = 5.8932976 

Flow = 180.00 L/s

Lime discharge = 210.58 g/min 
Sulfate discharge = 517.90 g/min 


Figure 9. Output of machine learning model 
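The discharge values in Figure 9 are consistent with a direct unit conversion of the predicted doses at the plant flow in liters per second, treating 1 ppm as 1 mg/L. The sketch below reproduces that arithmetic; the function name is an assumption, and the text does not state the formula the authors used.

```python
def discharge_g_per_min(dose_ppm, flow_l_per_s):
    """Convert a dose in ppm (taken as mg/L) and a flow in L/s to g/min."""
    mg_per_s = dose_ppm * flow_l_per_s
    return mg_per_s * 60 / 1000  # 60 s per minute, 1000 mg per gram


flow = 180.00
lime_discharge = discharge_g_per_min(19.4984726, flow)
sulfate_discharge = discharge_g_per_min(47.9535313, flow)
print(round(lime_discharge, 2))     # 210.58
print(round(sulfate_discharge, 2))  # 517.9
```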


Data from five of the jar tests performed by the plant operators were used to compare the response of
the neural network (RN) of the machine learning model with the manual response of the jar tests. The
results are shown in Table 2. According to Colombian regulations, the parameters of water for human
consumption are color of 0-15 platinum cobalt units (PCU), turbidity of 0-2 nephelometric turbidity units
(NTU), and pH between 6.5 and 9.0. Analysis of Table 2 shows that the dosages of lime and sulfate
obtained through the RN are lower than those applied by the operators. Still, they produce color and
turbidity outputs within the norm and equal or improve the pH value. However, the pH remains below the
values of the norm, an aspect that is adjusted by adding an offset to the minimum value predicted by the
neural model.


Table 2. Comparison of jar test results against the neural network

Test                                   1       2       3       4       5       6       7       8       9      10

Output data
Color (jar test)                    10.5    14.9    13.6     6.8     4.7    20.1     9.5     9.8       7    14.1
Color (neural net)                 10.83   10.86   10.83   10.72    7.62   17.87   18.35   11.24    8.48   11.87
Turbidity (jar test)                   1     1.9     0.6     0.3     0.7     1.4     1.7     0.8     0.9     2.9
Turbidity (neural net)              1.67    1.69    1.53    1.49     1.4    2.18     2.3    1.56    1.41    1.92
pH (jar test)                          6     6.1     5.3       5     4.8     7.4     5.3       6     5.5     6.2
pH (neural net)                     5.99       6    5.92    5.89    5.93    5.79    6.15       6    6.03    5.94

Calcium oxide (lime) (ppm)         18.85   19.43   19.49   19.33   17.64   18.26   14.01   19.96   18.24   19.22
Aluminum sulfate (ppm)             46.53   47.32   47.95   47.81   44.56   43.84   32.46   49.23   45.88   44.54

Dosage reduction (%)
Calcium oxide (lime) (ppm)         5.75%  11.68%   2.55%   7.95%  11.80%  23.92%  29.95%   4.95%  17.09%   3.90%
Aluminum sulfate (ppm)             6.94%   9.00%   4.10%   4.38%  10.88%  12.32%  37.58%   5.33%  11.77%  14.35%
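The comparison against the regulatory ranges quoted in the text, and the dosage reduction percentage, can be sketched as follows. The limits and the Figure 9 values come from the text. The operator dose of 20 ppm is a hypothetical value chosen for illustration and is not reported in the table, and the reduction formula is an assumption about how the percentages were computed.

```python
# Limits quoted in the text for water intended for human consumption.
LIMITS = {"color": (0.0, 15.0), "turbidity": (0.0, 2.0), "ph": (6.5, 9.0)}


def out_of_range(sample):
    """Return the names of the parameters that fall outside their limits."""
    return [name for name, (low, high) in LIMITS.items()
            if not low <= sample[name] <= high]


def reduction_percent(operator_dose, model_dose):
    """Percentage by which the model dose is lower than the operator dose."""
    return (operator_dose - model_dose) / operator_dose * 100


figure_9 = {"color": 10.8484245, "turbidity": 1.4653494, "ph": 5.8932976}
print(out_of_range(figure_9))                    # ['ph']
print(round(reduction_percent(20.0, 18.85), 2))  # 5.75
```

For the Figure 9 output, only pH falls outside its range, which matches the observation that pH remains below the norm.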


4. CONCLUSION 

The performance of artificial intelligence algorithms in industry-oriented solutions was demonstrated
with the coagulant and pH control system for water treatment developed through a neural model. The
optimal dosage of sulfate and lime obtained by this method generated an output pH lower than 7.5 and an
output turbidity lower than 8 NTU. The treatment plant's output presents low pH problems, as evidenced by
the jar test data, which are below the range suitable for human consumption. The predictive model created
from these data optimizes and standardizes the process chemistry, but it reproduces this low pH. It is
therefore necessary to correct the pH in the jar tests so that the database can be updated and the model
retrained to correct the output error. In conclusion, more efficient models that meet the requirements of
drinking water treatment require more effective characterization data.